I didn't know about Panel... thanks for sharing that.
Your video is the very best of its kind: clear, complete, concise, and it assumes the viewer knows nothing (I surely don't). Thank you VERY much for this.
Thanks so much for your support 🙏😊
Best video on YouTube explaining PDF Q/A with OpenAI and building a final app, thank you very much.
Thanks so much 🙏😊
great video!
Thanks for sharing Sophia, added to our playlist.
Thanks 🙏😊
Thanks!
Thanks so much 🙏❤ my first super thanks! Really appreciate it!
Sophia, this was a perfect video, everything is super clear. I subscribed ASAP to your channel, and right now I'm reading the Python Panel app post on your Medium. Your content is amazing. Congratulations.
Thanks so much 🙏😊
Wow, thanks Sophia! This is exactly something I was looking for, nice tutorial and explanation.
Thanks so much 🙏 glad it helped 😊
Great video Sophia. Thanks for sharing !!
Thanks so much 🙏
Thanks a lot for sharing your wisdom! Hopefully will be Sophia-enabled to make use of LangChain in a project...
Thanks so much for the support!
Great video. Definitely gives me FOMO for not exploring Panel much :)
You could try it today 😊 Panel is the best
Thanks, Sophia. Can you make a video on using Karpathy's NanoGPT instead of OpenAI? A question-answering PDF app using LangChain + NanoGPT.
Great video, Sophia!
Sophia, can you make a video showing how to bind the response to a Panel visualization, and store your X/Y values as variables based on user data or inputs?
Are there any limitations to this as far as the number of pages?
awesome tutorial by the way
Super cool video, you're very helpful!
Great video editing, thanks for the tutorial.
Thanks 🙏😊
Thanks for sharing the working model
Thank you 🙏
Perfect! Thank you very much, 💯Sophia Yang!
Very enjoyable video :D love the Panel + QA app!
Also, I don't know why, but with some very soft lofi jazz lounge coffee music in the background, I can definitely see this as a coding livestream 😁
haha thanks! I was too tired to add any music. Great suggestion though.
@@SophiaYangDS I think it's fine to not put music here, some people don't like music for purely educational content 🙂 for a chill reading/coding livestream though, that'd be dope
Rest well in between all the work commitments. 화이팅 ! (I've been watching too much kdrama recently XD)
@@andfanilo 화이팅!
I tried this on my health plan. I asked what the deductible amount was. I got all sorts of answers but not the exact amount until I told it to look in the Benefits Details section to find it. Then it answered correctly. I was hoping I could finally figure out how much every "touch" to my health/health plan would cost and what options are best...thoughts on tuning would be awesome. I'll keep plodding along. Thank you for this video.
Hi Sophia,
Thanks for the great video. You mentioned that you don't have to use OpenAI as the LLM in the RetrievalQA step. I've been messing around with HuggingFace, and none of the models I've used have been able to spit out a coherent or correct answer. Do you have any recommendations for a HuggingFace model that can at least somewhat replicate OpenAI's performance on this task? I'm definitely missing something here. Thanks so much.
P.S. Part of the problem I've had is that I'm not sure which of OpenAI's models we're calling, since the code is just llm=OpenAI(), so I don't know what types of models to look for. I've tried text-generation and text2text-generation models, but they don't work well, and question-answering models don't give the human-like response I'm looking for.
Hi! I have the same issue. Did you find any solution, or a good Hugging Face model that can help instead of OpenAI?
Great video, thank you. Is it really necessary to use LangChain for this? I was building a chatbot but gave up on LangChain, because the OpenAI functions example works and spends far fewer tokens without LangChain's long prompt.
Great content. Can you comment on how to programmatically predict or estimate the charge for each question asked?
Thanks for the great video tutorial. I ran into 2 problems:
1. Python 3.11 will not install chromadb. Downgraded to 3.10 works.
2. file_input.save("/.cache/temp.pdf") does not work. ChatGPT helped me solve the problem:
import os

current_dir = os.getcwd()
cache_dir = os.path.join(current_dir, '.cache')
if not os.path.exists(cache_dir):
    os.makedirs(cache_dir)
pdf_file = os.path.join(cache_dir, 'temp.pdf')
Then change the line to: file_input.save(pdf_file)
Thanks for pointing it out! Yeah, I created the .cache directory and gave permissions in the Dockerfile when hosting on Hugging Face Spaces. To run it locally, you can change ".cache/temp.pdf" to "temp.pdf".
@@SophiaYangDS Yes! I was encountering the same issue. I will try both of these.
Also, where did you specify the model you want to use for the embedding? E.g. GPT-3.5 Turbo, GPT-4, etc.
You can define "llm=xxx" and "embedding=xxx" in the qa function
If we use a fine-tuned model, do we still need to upload the PDF each time we query it, using the same GPT API account?
Thanks for your valuable video. How can I get a boolean QA response? Any suggestions?
You can specify it in the prompt
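A minimal sketch of what such a prompt could look like. The template text and variable names here are assumptions for illustration, not the exact prompt from the video:

```python
# A toy prompt template nudging the model toward a yes/no answer.
# The {context}/{question} slots mirror a typical RetrievalQA-style prompt;
# how it gets wired into the chain is an assumption, not shown in the video.
BOOLEAN_QA_PROMPT = (
    "Use the following context to answer the question.\n"
    "Answer with only 'yes' or 'no'.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def build_prompt(context: str, question: str) -> str:
    """Fill the template with retrieved context and the user question."""
    return BOOLEAN_QA_PROMPT.format(context=context, question=question)
```

Whether the model actually sticks to a bare yes/no depends on the model; a strict instruction like this usually helps but is not guaranteed.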
Great, thank you for the video. Is there a way I can load multiple PDFs?
Yes, you can put all the files in a list and use a for loop to load multiple PDFs.
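A small sketch of that loop. Collecting the file paths only needs the standard library; the loader call in the comment is a hypothetical, since the exact loader class used in the video is an assumption here:

```python
from pathlib import Path

def list_pdfs(folder: str) -> list[str]:
    """Return the paths of all PDFs in a folder, sorted for a stable order."""
    return sorted(str(p) for p in Path(folder).glob("*.pdf"))

# Hypothetical usage with a LangChain-style loader (loader name assumed):
# documents = []
# for path in list_pdfs("my_pdfs"):
#     documents.extend(PyPDFLoader(path).load())
```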
How do you compare Streamlit vs Panel?
thank you for the video!
Do you think it'd be possible to edit the PDF based on user responses (i.e., input new data) and then output a new PDF file?
Yes, you just need to define a function to change the content and save to a new file.
Also for some reason when I upload the pdf it won't load the content in the box. I don't see any errors. How can I fix this?
So, for a solution such as this, how would one account for the fact that the PDF(s) could contain personal information, such as HIPAA-protected health information or maybe financial information? From a security perspective, how do you handle or account for that? I assume the contents are uploaded to the cloud, so the data would be exposed and at risk... yes?
Yeah it is a concern for sure. OpenAI sees all the data, that's why some people prefer to use a local model. Speaking of the cloud, I think the government uses cloud also, so private info is likely already on the cloud 😅
Can it hold contextual memory for across all the document or just for the text chunks it receives after semantic search?
There are multiple ways to do question answering. Check out my previous video: ruclips.net/video/DXmiJKrQIvg/видео.html. In this case, the language model only sees relevant text. You can pass in all the text to the language model as well. It will just cost a lot of money.
Great video thanks !
I was wondering, would it be possible to make an example of a csv_agent with memory?
I tried with
agent = create_csv_agent(OpenAI(temperature=0), 'toto.csv' , pandas_kwargs={'sep': ";"}, verbose=True)
AgentExecutor.from_agent_and_tools(agent, tools, verbose=True, memory=memory)
but the constructor fails
After 12:06, what did you do in the black screen?
I opened the app address (localhost:5006/LangChain_QA_Panel_App) in the browser
How do you change the model? - say I want to use GPT-4.
You can define llm=xxx. Check out my previous video where I went through using a few different models. I don't have access to GPT-4, so I haven't tried it yet: ruclips.net/video/kmbS6FDQh7c/видео.html
If I have to load the PDF files from Google Drive, how can we do that?
How long does it take to generate an answer?
How can I load an entire folder instead of a single PDF?
Yes, input_documents can be a list. You can write a for loop to loop through all the files in a folder.
@@SophiaYangDS Can I load multiple PDFs from the Panel upload on localhost directly? Or should I create a directory first and list the PDFs in a list? Can you write some example code for it? Thanks for everything. It is a really useful tool.
Hello, I don't understand: what is the limit on the number of input files?
How can I use GPT-3.5 specifically as the language model? The one you showed is just OpenAI(); which model does it use when it's not specified?
Yes, you can specify llm = ChatOpenAI(model_name='gpt-3.5-turbo'). Check out the first part of my LangChain intro video on how to use LangChain with many different model providers: ruclips.net/video/kmbS6FDQh7c/видео.html
@@SophiaYangDS Maybe OpenAI, not ChatOpenAI?
I tried to host the same code in hugging face, but getting the below error.
failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "panel": executable file not found in $PATH: unknown
I'm not sure. Did you duplicate the space?
@@SophiaYangDS Never mind, that was a silly spelling mistake. Now it is finding the libs and building. Thank you though!
Great video, which country are you at right now?
Is there a way to run this kind of method with Langchain without using online services?
You can run the code locally in VS Code with Python.
The "Code in this video" linked in the video description cannot be accessed.
Thanks for the video. I tried your online app, but for some reason I get IndexError('list index out of range'). Why is that? Am I doing something wrong? I uploaded a 3-page PDF, entered my OpenAI key, set chunks to 1, and clicked "Run".
Is it possible to modify this to take the file(s) from a Google Drive folder?
Yes LangChain has a Google Drive loader python.langchain.com/en/latest/modules/indexes/document_loaders/examples/googledrive.html
Hi, thank you for this great video. Can I ask you something? What is the maximum length of a PDF, in pages, for this kind of job if we use, let's say, GPT-4?
There is no limit because the similarity calculation is happening outside of GPT. Once the relevant information is found in the document, it is fed into GPT. So as long as the chunk size is less than the GPT-4 limit (It is better not to use big chunks as well), you can use documents with any length.
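To make that point concrete, here is a toy fixed-size splitter. This is not LangChain's actual text splitter, just an illustration of why only the chunk size, not the document length, is bounded by the model's context window:

```python
def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Naively split text into fixed-size character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# Only the retrieved chunks (plus the question) are sent to the model,
# so each chunk must fit in the context window; the whole document need not.
```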
It is also giving some responses out of context (not from the PDF). Has anyone faced or observed the same thing? If yes, how can we stop it? I have tried prompts with the chain, but that didn't work.
In the code, I asked it to output the answer and the relevant chunks of text. Is that what you see? You can remove the relevant chunks of text (return_source_documents=False and others).
@@SophiaYangDS No, I think you misunderstood my question. I am seeing the model answer questions that are not in the PDF documents, e.g. "Who is Elon Musk?" The model gives this answer using GPT's trained knowledge instead of using only the PDFs. I want to restrict this.
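A common mitigation is to make the prompt itself restrictive. Below is a hypothetical sketch (the template text and variable names are assumptions, not from the video); note it reduces, but does not guarantee the elimination of, out-of-document answers:

```python
# A hypothetical restrictive prompt: instruct the model to refuse when the
# answer is not in the retrieved context. Treat this as a mitigation, not a
# guarantee -- how well it works depends on the model.
STRICT_PROMPT = (
    "Answer the question using ONLY the context below. "
    "If the answer is not contained in the context, reply exactly: "
    "\"I don't know based on the provided documents.\"\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def strict_prompt(context: str, question: str) -> str:
    """Fill the restrictive template for one question."""
    return STRICT_PROMPT.format(context=context, question=question)
```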
Can we do this in Node.js?
Great video. Since I don't have a paid subscription to OpenAI's API, can you please detail how I can use other models from Hugging Face to replicate this? Especially the Llama 2 model. Thanks.
I think it's not obvious in the video, but embedding all the chunks when you first load the document should take some time.
Can we try another LLM besides OpenAI, given the cost?
Yes you can change llm in the code
Hi Sophia, excellent video as always. I'm a big fan of your content, and I'm sure you'll grow a lot in the YT tech space.
I would like your thoughts on this one. I'm building a chatbot that helps my users get information about a functionality and execute some actions via API... I was thinking of having the GPT-3.5-turbo Chat API as an "orchestrator": if the user wants information, redirect the request to a query on a vector DB to get useful chunks of info, feed those to GPT-4, and return an appropriate response to the user's question; if instead the user wants to execute an action, redirect the request to GPT-4 and the LangChain OpenAPI Agent to execute it and return the result to the user.
What do you think about this approach? Any suggestions?
Thanks so much for the support! Appreciate it! I'm actually not sure if I'm following your idea. Are you in the LangChain Discord? Might be a good place to get feedback on your ideas : )
@@SophiaYangDS Yes, I am on the Discord! But I didn't get much feedback and wanted to hear your thoughts on it.
Let me explain better. Basically, I want to create a chatbot that uses the ChatGPT API. This chatbot needs to support normal conversations, but also have the capability to respond using internal documents (this part is pretty clear, you made excellent tutorials on that). The chatbot also lets the user interact with some APIs of the platform... now, this part is also pretty clear, but my issue is integrating these two use cases into a single chat experience. I hope that makes it clearer, and thank you for the response!
@@christiancarpinelli Sounds like you want to combine the PDF retriever chain with another API chain? If I understand you correctly, I think you could do a sequential chain or write your own logic to combine these two chains.
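The "write your own logic" option could be as simple as a dispatcher that decides which chain handles each message. This is a toy sketch: the keyword-based intent check and the callable-chain interface are assumptions for illustration (real systems often let an LLM classify the intent instead):

```python
def route(user_message: str, retrieval_chain, action_chain):
    """Toy router: send action-like requests to the API/action chain,
    everything else to the document-retrieval chain.
    Both chains are assumed to be plain callables taking the message."""
    action_verbs = ("run", "execute", "create", "delete", "update")
    if user_message.lower().startswith(action_verbs):
        return action_chain(user_message)
    return retrieval_chain(user_message)
```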
Can you set this up with the Vicuna model? That is the true test, because not everyone wants to send their data to OpenAI.
You can use llama.cpp with LangChain.
Does this work with Data in Excel or Google Sheet ?
Yes LangChain has a CSV document loader and a GCS document loader. You can try those
How do I know what my API key is?!
You can get your API key from the OpenAI website
Hi, please help me. How can I create a custom model from many PDFs in the Persian language? Thank you.
The app doesn't work, and I don't know why.
I just tried again. It works for me. Did you set up billing at OpenAI? The OpenAI API only works when the billing is set up. It's also possible when many people tried the app at the same time, it just crashed
But again, the OpenAI API key isn't free, so deploying this publicly could cost you a lot of money.
Anyone else following along? I tried to run the code but got this error when running panel serve LangChain_QA_Panel_App.ipynb:
LangChain_QA_Panel_App.ipynb", line 7, in
"metadata": {},
^^^^^^^^^^^
NameError: name 'get_ipython' is not defined
I got the same error, have you found a solution?
What are your thoughts on Dolly 2.0?
Smile. It is finally open source with a CC BY-SA 3.0 license for commercial integration. And closed-source LLaMA is history.